OUTLINE

(written 3.17.2025) This is how I imagine this report:

Next step:
  • These cluster assignments (and overlaps) can then be used to evaluate turnover.
  • I think “turnover” will be a separate project.

An idea:
  • A child document that defines the “count rate” and “sum rate”. Possibly import my PPT slides?

It looks like I shouldn’t run anything in the master document; instead, I should run everything as child documents.

I’m really going to want to see where the Pareto performers are.

COMPARING CLUSTERS

Percent of rate cluster per complex cluster

                 Complex cluster
Rate cluster       1       2       3       4       5
1                0.0     0.0    41.2    37.4    21.3
2                0.0     0.0     0.3    52.7    47.0
3                0.0     4.2     0.0    59.5    36.3
4                0.0     8.6     0.0    42.9    48.6
5                0.0    38.4     0.0    16.6    45.0
6                0.0    44.2     0.0    25.0    30.8
7                0.0    76.4     0.0     1.2    22.4
8               35.3    62.5     0.0     0.0     2.2

Percent of complex cluster per rate cluster

                    Rate cluster
Complex cluster       1       2       3       4       5       6       7       8
1                   0.0     0.0     0.0     0.0     0.0     0.0     0.0   100.0
2                   0.0     0.0     2.0     5.3    14.6    17.4    31.7    29.0
3                  98.9     1.1     0.0     0.0     0.0     0.0     0.0     0.0
4                  14.2    34.9    20.3    18.8     4.5     7.0     0.4     0.0
5                   8.0    30.9    12.2    21.1    12.1     8.5     6.6     0.7
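Both percentage tables above come from one cross-tabulation of PIs by rate cluster and complex cluster, normalized two different ways. A minimal Python sketch, assuming a hypothetical list with one `(rate_cluster, complex_cluster)` pair per PI; the function name and sample data are illustrative and not taken from the report's code:

```python
from collections import Counter


def crosstab_percent(pairs, normalize="row"):
    """Percent table from (row_label, col_label) pairs, one pair per PI.

    normalize="row": each row sums to 100, like "Percent of rate cluster per
    complex cluster" when pairs are (rate_cluster, complex_cluster).
    normalize="col": each column sums to 100; transposed, this gives
    "Percent of complex cluster per rate cluster".
    """
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    if normalize == "row":
        totals = {r: sum(counts[(r, c)] for c in cols) for r in rows}
        return {(r, c): 100 * counts[(r, c)] / totals[r] for r in rows for c in cols}
    totals = {c: sum(counts[(r, c)] for r in rows) for c in cols}
    return {(r, c): 100 * counts[(r, c)] / totals[c] for r in rows for c in cols}


# Hypothetical assignments for six PIs: (rate_cluster, complex_cluster)
pairs = [(1, 3), (1, 3), (1, 4), (8, 1), (8, 2), (8, 2)]
by_row = crosstab_percent(pairs, "row")
by_col = crosstab_percent(pairs, "col")
print(round(by_row[(1, 3)], 1))  # 66.7: two of the three rate-cluster-1 PIs are in complex cluster 3
print(by_col[(8, 2)])            # 100.0: every complex-cluster-2 PI is in rate cluster 8
```

Row normalization answers “where do the PIs of this rate cluster land?”, while column normalization answers the reverse question, which is why the two tables are not simply transposes of each other.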

HI PERFORMERS

LOW PERFORMERS


Low performers are in Rate Cluster 1 (less than ~20% win rates by both count and sum).

Strike-outs:

These PIs have never won a proposal. Strike-outs are located in the overlap with Complex Cluster 3 (“Pipe Dreams”).

Sputtering:

These PIs submit a decent number of proposals with relatively few wins.

They are located in the overlap with Complex Cluster 4 (“Plucky”).

Many at bats, few hits:

These PIs submit a large number of proposals with relatively few wins.

They are located in the overlap with Complex Cluster 5 (“Prolific”).

“Inefficient” Performers

These PIs have a win rate below 20% yet are among the top 20% of performers, the group that together brings in 80% of the total funds won over the last ten years. (Maybe exclude them from the other clusters?)
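One way to make the “top 20% / 80% of funds” idea concrete: rank PIs by funds won, take the smallest group of top earners whose combined funds reach 80% of the total (the Pareto set), and flag members of that group whose win rate is below 20%. This is a sketch under assumptions: the 80% and 20% cutoffs come from the note above, while the function names, the use of a single win rate per PI, and the sample data are hypothetical.

```python
def pareto_set(funds_won, share=0.80):
    """Smallest group of top earners whose combined funds won reach `share` of the total.

    funds_won: dict mapping a (hypothetical) PI id to that PI's total funds requested won.
    """
    total = sum(funds_won.values())
    selected, running = set(), 0
    for pi, amount in sorted(funds_won.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break
        selected.add(pi)
        running += amount
    return selected


def inefficient_performers(funds_won, win_rate, max_rate=0.20, share=0.80):
    """Members of the Pareto set whose win rate is below `max_rate`."""
    return {pi for pi in pareto_set(funds_won, share) if win_rate[pi] < max_rate}


# Hypothetical data: total funds won and a single win rate per PI
funds = {"A": 5_000_000, "B": 3_000_000, "C": 1_000_000, "D": 500_000, "E": 500_000}
rates = {"A": 0.15, "B": 0.60, "C": 0.10, "D": 0.90, "E": 0.05}
print(sorted(pareto_set(funds)))                     # ['A', 'B']
print(sorted(inefficient_performers(funds, rates)))  # ['A']
```

Whether the win rate should be the count rate, the sum rate, or both is left open here, as it is in the notes.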

Notes:

Try showing the win and loss boxplots for each of these subcategories.

I need to set the same scale for timeDots. I need to modify timeDots according to my new and improved functions. I should show the tick marks (timeLine) as well as the timeDots.

What about people who have a low win rate but a high total sum? (The bottom right of the “Funds requested” graph?)

Maybe I have two criteria for low performers – (1) low win rate (what I have so far) and (2) low total funds requested won (who would this be? I guess I want a Pareto already!)

I need to see these by “total funds requested won” per PI.

I need to examine the PIs that have a very low win rate but have still pulled in enough money to be significant contributors to the U.

I can color the ones in the graph above according to the 80th percentile…?

It would be really cool to replace my background gray grid with transparently changing viridis bands according to the percentiles, or the cumulative percentile or something like that.

I need to include complex cluster 2; it’s just incomplete otherwise.

And maybe we don’t call these “low performers” but we call them “wasted effort” or something like that?

I mean, how can you call someone a low performer when they are in the 90th percentile for funds brought in?

And how can you call someone a low performer when they nabbed 75% of their proposals, but are in the cumulative 10th percentile?

“Low” would be a measure of performance against your own potential or against your peers’.

I guess I’d want to carefully designate the “low” performers that are above the 80th percentile. That’s a curious sub-set.


APPENDIX

Filtering based on the count of proposals per principal investigator

EXECUTIVE SUMMARY

Using the natural breaks identified by clustering the counts of proposals per principal investigator, 1,264 principal investigators submitting three or fewer proposals are filtered out, and 1,672 principal investigators submitting four or more proposals are kept for further clustering.
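The report does not say how the natural breaks were computed. As an illustration only, the sketch below finds optimal one-dimensional class breaks by dynamic programming, minimizing the total within-class sum of squared deviations (Fisher's exact method, the criterion that Jenks natural breaks optimizes). The function name and sample counts are hypothetical; with the report's data, the boundary of interest is the one between three and four proposals.

```python
def natural_breaks(values, k):
    """Return the lower bound of each of k classes for 1-D data.

    Classes are contiguous runs of the sorted values chosen to minimize the
    total within-class sum of squared deviations (exact dynamic programming).
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(values)")
    # Prefix sums give the squared-deviation cost of any run xs[i:j] in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]  # best[c][j]: first j values in c classes
    start = [[0] * (n + 1) for _ in range(k + 1)]   # index where the last of those classes begins
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = best[c - 1][i] + cost(i, j)
                if candidate < best[c][j]:
                    best[c][j] = candidate
                    start[c][j] = i

    # Walk back from the full data set to recover where each class starts.
    lower_bounds, j = [], n
    for c in range(k, 0, -1):
        i = start[c][j]
        lower_bounds.append(xs[i])
        j = i
    return lower_bounds[::-1]


# Hypothetical proposal counts per PI
counts = [1, 1, 1, 2, 2, 3, 3, 9, 10, 12, 15]
print(natural_breaks(counts, 2))  # [1, 9]
```

The loop is O(k·n²), which is slow but workable for a few thousand PIs; deduplicating the counts first would shrink n considerably.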

COMPLEX CLUSTERS BASED ON MULTIPLE DIMENSIONS

SECTIONS

  1. Summary
  2. Method
  3. Clusters verbally described
  4. Clusters visually described
  5. Table of aggregated values by cluster
  6. Hierachical clustering on principal components
  7. Clusters displayed on scatter plots
  8. Variable relevance per cluster
  9. Clusters over time
  10. Cluster populations by college and major institutions

1. SUMMARY

This shows principal investigators sorted into five clusters using multiple criteria.

2. METHOD

First, principal investigators (PIs) with three or fewer proposals are filtered out: 1,265 principal investigators (43%) are removed and 1,672 principal investigators (57%) are kept.
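The filtering step amounts to counting proposals per PI and applying a threshold. A minimal sketch, assuming hypothetical proposal records with a `pi` field; the cutoff of four proposals is the one stated above.

```python
from collections import Counter

MIN_PROPOSALS = 4  # PIs with three or fewer proposals are filtered out


def split_by_proposal_count(proposals, min_count=MIN_PROPOSALS):
    """Return (kept, removed) sets of PI ids, based on proposals submitted per PI."""
    counts = Counter(p["pi"] for p in proposals)
    kept = {pi for pi, n in counts.items() if n >= min_count}
    removed = set(counts) - kept
    return kept, removed


# Hypothetical records: PI "A" has 5 proposals, "B" has 3, "C" has 4
proposals = [{"pi": "A"}] * 5 + [{"pi": "B"}] * 3 + [{"pi": "C"}] * 4
kept, removed = split_by_proposal_count(proposals)
print(sorted(kept), sorted(removed))                  # ['A', 'C'] ['B']
print(f"{len(kept) / len(kept | removed):.0%} kept")  # 67% kept
```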

Second, several variables are calculated per principal investigator:

  • Sum of funds requested (won and lost)
  • Count of proposals (won and lost)
  • Mean and median of funds requested (won and lost)
  • Rate of proposals won (count rate)
  • Rate of funds requested won (sum rate)
  • Total count and sum
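The per-PI calculations in the list above can be sketched as follows, assuming hypothetical `(pi, amount_requested, won)` records. The report does not say how means and medians are handled for a PI with no wins (or no losses); this sketch reports 0, which matches the zeros in the aggregated tables but is an assumption.

```python
from statistics import mean, median


def pi_metrics(records):
    """Per-PI metrics from (pi, amount_requested, won) records."""
    grouped = {}
    for pi, amount, won in records:
        entry = grouped.setdefault(pi, {"win": [], "loss": []})
        entry["win" if won else "loss"].append(amount)

    metrics = {}
    for pi, entry in grouped.items():
        wins, losses = entry["win"], entry["loss"]
        total_count = len(wins) + len(losses)
        total_sum = sum(wins) + sum(losses)
        metrics[pi] = {
            "win.sum": sum(wins),
            "loss.sum": sum(losses),
            "win.count": len(wins),
            "loss.count": len(losses),
            # Assumption: mean/median reported as 0 when there are no wins (or no losses).
            "win.mean": mean(wins) if wins else 0,
            "win.median": median(wins) if wins else 0,
            "loss.mean": mean(losses) if losses else 0,
            "loss.median": median(losses) if losses else 0,
            "count.rate": len(wins) / total_count,                     # rate of proposals won
            "sum.rate": sum(wins) / total_sum if total_sum else 0.0,  # rate of funds requested won
            "total.count": total_count,
            "total.sum": total_sum,
        }
    return metrics


# Hypothetical records for one PI: two wins and two losses
records = [("A", 100_000, True), ("A", 300_000, False), ("A", 200_000, True), ("A", 400_000, False)]
m = pi_metrics(records)["A"]
print(m["count.rate"], m["sum.rate"], m["win.median"])  # 0.5 0.3 150000.0
```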

Third, all variables except the rates are transformed with the natural log; then all variables are centered and scaled.
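A sketch of the transform-then-scale step, assuming one dict of numeric variables per PI (the function and column names are hypothetical). A natural log of zero is undefined, and PIs with no wins have a zero won sum, so this sketch uses log1p, that is, log(1 + x); how the original handled zeros is not stated. Scaling uses the sample standard deviation (n − 1), as R's `scale()` does.

```python
import math
from statistics import mean, stdev

RATE_COLUMNS = {"count.rate", "sum.rate"}  # left on their original 0-1 scale


def log_and_standardize(rows):
    """Log-transform every non-rate column, then center and scale all columns.

    rows: list of dicts (one per PI) sharing the same numeric keys.
    Uses log1p (an assumption) so that zero sums stay finite.
    """
    logged = [
        {k: (v if k in RATE_COLUMNS else math.log1p(v)) for k, v in row.items()}
        for row in rows
    ]
    out = [{} for _ in rows]
    for key in rows[0]:
        column = [r[key] for r in logged]
        mu, sd = mean(column), stdev(column)  # sample standard deviation (n - 1)
        for target, x in zip(out, column):
            target[key] = (x - mu) / sd if sd > 0 else 0.0
    return out


# Hypothetical values for three PIs
rows = [
    {"win.sum": 0, "sum.rate": 0.0},
    {"win.sum": 100_000, "sum.rate": 0.5},
    {"win.sum": 2_000_000, "sum.rate": 1.0},
]
z = log_and_standardize(rows)
print([round(r["sum.rate"], 2) for r in z])  # [-1.0, 0.0, 1.0]
```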

Fourth, principal components are extracted.

Fifth, hierarchical clustering using Euclidean distance and Ward’s method is used to produce five clusters.
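Ward's method merges, at each step, the two clusters whose union increases the total within-cluster sum of squares the least; stopping when k clusters remain gives the same partition as cutting the dendrogram at k groups. The pure-Python sketch below is a naive O(n³) version for illustration, not the package code behind the report, and the sample points are hypothetical. A side note: if every principal component is kept (unscaled), the component scores are only a rotation of the standardized data, so Euclidean distances, and hence the Ward clusters, are the same either way; the PCA step changes the result only when components are dropped.

```python
from itertools import combinations


def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters.

    points: list of equal-length numeric tuples (e.g. principal-component scores).
    Returns one label in 0..k-1 per point. Naive O(n^3): fine for a sketch only.
    """
    clusters = [[i] for i in range(len(points))]  # member indices per cluster
    centroids = [list(p) for p in points]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if clusters a and b merge.
        na, nb = len(clusters[a]), len(clusters[b])
        d2 = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
        return na * nb / (na + nb) * d2

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2), key=lambda ab: merge_cost(*ab))
        na, nb = len(clusters[a]), len(clusters[b])
        centroids[a] = [(na * x + nb * y) / (na + nb) for x, y in zip(centroids[a], centroids[b])]
        clusters[a].extend(clusters[b])
        del clusters[b], centroids[b]  # b > a, so index a is unaffected

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
print(ward_clusters(pts, 3))  # [0, 0, 1, 1, 2]
```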

The clusters are named, described verbally and visually, and the cluster populations by colleges and major institutions are shown.

3. CLUSTERS VERBALLY DESCRIBED

CLUSTER 1: PERFECT
Principal investigators in Cluster 1 have perfect records: 100% win rates on relatively small proposals.

A high proportion of PIs from the School of Business (33.3%) fall in this cluster.

It has a population of 65 PIs (3.9%) and accounts for only 1.7% of total requested funds won.

CLUSTER 2: PRECISE
Principal investigators in Cluster 2 don’t have the perfect record of Cluster 1, but they win a high proportion of their proposals on both a count and a sum basis. They bring in nearly as much in total funds requested won as Cluster 4 (13.3% vs. 14.2%) by racking up many relatively small wins.

It has a population of 397 PIs (23.7%) and accounts for 13.3% of total requested funds won.

CLUSTER 3: PIPE DREAMS
Cluster 3 contains principal investigators with zero and near-zero win rates. They ambitiously attempt proposals as large as those of the principal investigators in Clusters 4 and 5 – just without success.

It has high proportions of PIs from the colleges of Law, Health, and Social and Behavioral Sciences.

It has a population of 88 PIs (5.3%) and accounts for 0% of total requested funds won.

CLUSTER 4: PLUCKY
Cluster 4 contains principal investigators who differ from Cluster 5 mainly in the number of proposals submitted. They submit similarly large proposals, just far fewer of them – less than a third as many as Cluster 5. Some of this is due to longevity (PIs in Cluster 4 have been submitting bids for a median of 1.86 years, compared to 3.7 years in Cluster 5), but some is also due to the rate of submission: PIs in Cluster 5 submit 30% more bids per year (comparing cluster medians).
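The report does not define how longevity and submission rate were measured. One plausible reading, sketched here with hypothetical function names and dates, takes years active as the span between a PI's first and last submission and the submission rate as proposals per year over that span; the cluster comparison then uses the medians of these per-PI values.

```python
from datetime import date
from statistics import median

DAYS_PER_YEAR = 365.25


def tenure_and_rate(submission_dates):
    """Assumed definitions: years from first to last submission, and proposals per year.

    The rate is None when every submission falls on the same day.
    """
    years = (max(submission_dates) - min(submission_dates)).days / DAYS_PER_YEAR
    rate = len(submission_dates) / years if years > 0 else None
    return years, rate


def percent_more(a, b):
    """How much larger a is than b, in percent."""
    return 100 * (a / b - 1)


# Hypothetical PI with four submissions over two years
years, rate = tenure_and_rate([date(2020, 1, 1), date(2020, 7, 1), date(2021, 1, 1), date(2022, 1, 1)])
print(round(years, 2), round(rate, 2))  # 2.0 2.0

# Hypothetical per-PI rates in two clusters, compared by their medians
rates_c4 = [1.5, 2.0, 2.5]
rates_c5 = [2.0, 2.6, 3.0]
print(round(percent_more(median(rates_c5), median(rates_c4))))  # 30
```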

Cluster 4 PIs also lag Cluster 5 PIs in win rates (0.31 vs. 0.43 by count and 0.21 vs. 0.27 by sum).

It has a population of 558 PIs (33.4%) and accounts for 14.2% of total requested funds won.

CLUSTER 5: PROLIFIC
Cluster 5 contains principal investigators with a prolific record of submission. They submit a large number of proposals – more than triple Cluster 4, the next highest – with impressive win rates on large proposals. These are the workhorses of research at the University of Utah.

Principal investigators from the major institutions (EGI, CTSI, CVRTI, ICSE, and SCI) tend to show up here.

It has a population of 564 PIs (33.7%) and accounts for a large majority (70.8%) of total requested funds won.

4. CLUSTERS VISUALLY DESCRIBED

5. TABLE OF AGGREGATED VALUES BY CLUSTER

Aggregated Metrics by Complex Cluster

cluster  population  pop_perc  count_total  count_perc        win.sum  sum_perc  win.count  win.mean  win.median       loss.sum  loss.count  loss.mean  loss.median  sum.rate  count.rate  color
      1          65       3.9          400           2     84,000,000         2        400   208,000      50,000              0           0          0            0      1.00        1.00  forestgreen
      2         397      23.7        3,910          15    678,000,000        13      2,920   232,000      47,000    210,000,000       1,000    210,700       68,200      0.76        0.75  deepskyblue
      3          88       5.3          570           2              0         0          0         0           0    408,000,000         560    721,500      413,600      0.00        0.00  goldenrod
      4         558      33.4        4,830          18    723,000,000        14      1,500   482,000     138,000  2,762,000,000       3,330    829,700      409,800      0.21        0.31  firebrick
      5         564      33.7       16,960          64  3,608,000,000        71      7,230   499,000      91,000  9,848,000,000       9,740  1,011,600      417,900      0.27        0.43  darkslategray
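The share and rate columns in the table above appear to be pooled ratios computed from the cluster totals, for example sum.rate = win.sum / (win.sum + loss.sum), rather than averages of per-PI rates. The sketch below recomputes them from the rounded values in the table; the results agree with the published sum_perc, sum.rate, and count.rate to within rounding (for example, about 70.8% of funds won and a sum rate of about 0.27 for Cluster 5). The function name is hypothetical.

```python
# Cluster totals copied from "Aggregated Metrics by Complex Cluster" (already rounded).
TOTALS = {
    # cluster: (win.sum, loss.sum, win.count, loss.count)
    1: (84_000_000, 0, 400, 0),
    2: (678_000_000, 210_000_000, 2_920, 1_000),
    3: (0, 408_000_000, 0, 560),
    4: (723_000_000, 2_762_000_000, 1_500, 3_330),
    5: (3_608_000_000, 9_848_000_000, 7_230, 9_740),
}


def pooled_summary(totals):
    """Share of all funds won, pooled sum rate, and pooled count rate per cluster."""
    all_won = sum(win_sum for win_sum, _, _, _ in totals.values())
    summary = {}
    for cluster, (win_sum, loss_sum, win_n, loss_n) in totals.items():
        summary[cluster] = {
            "sum_perc": 100 * win_sum / all_won,
            "sum.rate": win_sum / (win_sum + loss_sum),
            "count.rate": win_n / (win_n + loss_n),
        }
    return summary


for cluster, row in pooled_summary(TOTALS).items():
    print(cluster, {k: round(v, 2) for k, v in row.items()})
```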

6. HIERARCHICAL CLUSTERING ON PRINCIPAL COMPONENTS

7. CLUSTERS DISPLAYED ON SCATTER PLOTS

8. VARIABLE RELEVANCE PER CLUSTER

9. CLUSTERS OVER TIME

Although this clustering did not include any time variables, the submission patterns by cluster are shown.

10. CLUSTER POPULATIONS BY COLLEGE AND MAJOR INSTITUTIONS

259 PIs (15.5%) appear in multiple colleges or institutions.

Percent of organization population per cluster

            Complex cluster
Org            1       2       3       4       5
Arch         0.0    41.2     0.0    47.1    11.8
Bus         33.3    46.7     0.0    13.3     6.7
CTSI         0.0    16.7     8.3    25.0    50.0
CVRTI        0.0    11.1     5.6    22.2    61.1
Dent         0.0    11.1     0.0    55.6    33.3
Educ         6.9    20.7    10.3    55.2     6.9
EGI         10.5    21.1    10.5    15.8    42.1
Engr         0.4     6.8     4.3    34.9    53.6
FinArt       0.0    44.4     0.0    44.4    11.1
Health       3.6    21.4    14.3    39.3    21.4
Hum          6.2    43.8     6.2    25.0    18.8
Hunt         0.6    28.2     2.6    27.6    41.0
ICSE        10.0    10.0     0.0    30.0    50.0
Law          0.0    40.0    20.0    40.0     0.0
Med          4.0    28.1     4.4    27.2    36.4
Nurs         5.7     8.6    11.4    45.7    28.6
other        6.4    25.6     2.6    21.8    43.6
Pharm        0.0    19.4     3.2    35.5    41.9
SCI          5.4    16.2     0.0    29.7    48.6
Science      3.7    19.4     4.6    41.0    31.3
SocBeh       3.8    20.3    13.9    41.8    20.3
SocWrk       4.3    43.5     4.3    39.1     8.7
Tran         0.0    50.0     0.0    50.0     0.0

Percent of cluster population per organization

            Complex cluster
Org            1       2       3       4       5
Arch         0.0     1.6     0.0     1.3     0.3
Bus          7.5     1.6     0.0     0.3     0.1
CTSI         0.0     0.9     2.1     1.0     1.7
CVRTI        0.0     0.4     1.0     0.7     1.6
Dent         0.0     0.2     0.0     0.8     0.4
Educ         3.0     1.3     3.1     2.6     0.3
EGI          3.0     0.9     2.1     0.5     1.1
Engr         1.5     3.6    10.4    13.4    17.8
FinArt       0.0     0.9     0.0     0.7     0.1
Health       3.0     2.7     8.3     3.6     1.7
Hum          1.5     1.6     1.0     0.7     0.4
Hunt         1.5     9.9     4.2     7.0     9.0
ICSE         1.5     0.2     0.0     0.5     0.7
Law          0.0     0.4     1.0     0.3     0.0
Med         46.3    49.1    35.4    34.5    40.1
Nurs         3.0     0.7     4.2     2.6     1.4
other        7.5     4.5     2.1     2.8     4.8
Pharm        0.0     2.7     2.1     3.6     3.7
SCI          3.0     1.3     0.0     1.8     2.5
Science     11.9     9.4    10.4    14.5     9.6
SocBeh       4.5     3.6    11.5     5.4     2.3
SocWrk       1.5     2.2     1.0     1.5     0.3
Tran         0.0     0.2     0.0     0.2     0.0

SIMPLE CLUSTERS BASED ON WIN RATES

SECTIONS

  1. Summary
  2. Method
  3. Clusters verbally described
  4. Clusters visually described
  5. Table of aggregated values by cluster
  6. Hierarchical clustering on principal components
  7. Clusters displayed on scatter plots
  8. Variable relevance per cluster
  9. Clusters over time
  10. Cluster populations by college and major institutions

1. SUMMARY

This shows principal investigators sorted into eight clusters using only two criteria: the win rate by count of proposals and the win rate by sum of funds requested.

2. METHOD

First, principal investigators (PIs) with three or fewer proposals are filtered out: 1,265 principal investigators (43%) are removed and 1,672 principal investigators (57%) are kept.

Second, two variables are calculated per principal investigator:

  • Rate of proposals won (count rate)
  • Rate of funds requested won (sum rate)

Third, the rates are centered and scaled.

Fourth, principal components are extracted.
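With only two centered and scaled rates, the principal components have a closed form: for correlation r between the two rates, the components are the sum and the difference of the standardized rates divided by √2, with variances 1 + r and 1 − r. A sketch, assuming two equal-length lists of per-PI rates (the function name and data are hypothetical). Keeping both components is just a 45° rotation, so it leaves Euclidean distances, and therefore a Ward clustering, unchanged.

```python
import math
from statistics import mean, stdev


def two_rate_pca(count_rate, sum_rate):
    """Principal-component scores for two variables after centering and scaling.

    The correlation matrix [[1, r], [r, 1]] has eigenvectors (1, 1)/sqrt(2) and
    (1, -1)/sqrt(2) with eigenvalues 1 + r and 1 - r, so no solver is needed.
    """
    mx, sx = mean(count_rate), stdev(count_rate)
    my, sy = mean(sum_rate), stdev(sum_rate)
    zx = [(v - mx) / sx for v in count_rate]
    zy = [(v - my) / sy for v in sum_rate]
    r = sum(a * b for a, b in zip(zx, zy)) / (len(zx) - 1)  # Pearson correlation
    plus = [(a + b) / math.sqrt(2) for a, b in zip(zx, zy)]
    minus = [(a - b) / math.sqrt(2) for a, b in zip(zx, zy)]
    # The direction with the larger eigenvalue (1 + |r|) is the first component.
    pc1, pc2 = (plus, minus) if r >= 0 else (minus, plus)
    return pc1, pc2, (1 + abs(r)) / 2  # last value: share of total variance on PC1


count_rate = [0.10, 0.30, 0.50, 0.90]  # hypothetical per-PI rates
sum_rate = [0.05, 0.20, 0.60, 0.95]
pc1, pc2, share = two_rate_pca(count_rate, sum_rate)
print(round(share, 3))
```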

Fifth, hierarchical clustering using Euclidean distance and Ward’s method is used to produce eight clusters.

The clusters are named, described verbally and visually, and the cluster populations by colleges and major institutions are shown.

3. CLUSTERS VERBALLY DESCRIBED

CLUSTER 1: ROCK BOTTOM
Principal investigators in Cluster 1 have less than ~20% win rates by both count and sum.

It has a population of 211 PIs (12.6%) and accounts for only 2.1% of total requested funds won.

CLUSTER 2: DOMINANT
Principal investigators in Cluster 2 have less than ~40% win rates by both count and sum.

It has the largest share of the population and proposals submitted, as well as a large share of funds awarded.

It has a population of 370 PIs (22.1%) and accounts for 16.6% of total requested funds won.

CLUSTER 3: BIG WHIFF
Principal investigators in Cluster 3 have less than ~30% win rates by sum and less than ~70% win rate by count.

They have a respectable win rate by count but lose the larger proposals.

Cluster 3 has a population of 190 PIs (11.4%) and accounts for 5.5% of total requested funds won.

CLUSTER 4: MONEY
Principal investigators in Cluster 4 have less than ~60% win rates by sum and less than ~50% win rates by count.

It has the largest share of funds requested won.

Cluster 4 has a population of 245 PIs (14.7%) and accounts for 25.5% of total requested funds won.

CLUSTER 5: MISSING THE BIG WINS
Principal investigators in Cluster 5 have less than ~50% win rates by sum and greater than ~50% win rates by count.

They have an excellent win rate by count, but like Cluster 3, they lose the larger proposals.

Cluster 5 has a population of 151 PIs (9%) and accounts for 9.9% of total requested funds won.

CLUSTER 6: BRONZE MEDAL
Principal investigators in Cluster 6 have greater than ~40% win rates by sum and less than ~65% win rates by count.

Their win rates by sum are in third place, behind Clusters 7 and 8. Unlike Clusters 3 and 5, they win the large proposals.

Cluster 6 has a population of 156 PIs (9.3%) and accounts for 17.1% of total requested funds won.

CLUSTER 7: SILVER MEDAL
Principal investigators in Cluster 7 have greater than ~50% win rates by sum and greater than ~60% win rates by count. They have the second-highest win rates.

Cluster 7 has a population of 165 PIs (9.9%) and accounts for 14.1% of total requested funds won.

CLUSTER 8: GOLD MEDAL
Principal investigators in Cluster 8 have greater than ~80% win rates by sum and greater than ~65% win rates by count. They have the highest win rates.

Cluster 8 has a population of 184 PIs (11%) and accounts for 9.3% of total requested funds won.

4. CLUSTERS VISUALLY DESCRIBED

5. TABLE OF AGGREGATED VALUES BY CLUSTER

Aggregated Metrics by Rate Cluster

cluster  population  pop_perc  count_total  count_perc        win.sum  sum_perc  win.count  win.mean  win.median       loss.sum  loss.count  loss.mean  loss.median  sum.rate  count.rate  color
      1         211      12.6        2,780          10    106,000,000         2        250   420,000     152,000  2,384,000,000       2,530    943,500      422,600      0.04        0.09  peru
      2         370      22.1        6,890          26    845,000,000        17      1,770   479,000     160,000  5,025,000,000       5,120    981,100      419,400      0.14        0.26  darkseagreen
      3         190      11.4        2,820          11    278,000,000         6      1,350   206,000      50,000  1,721,000,000       1,470  1,169,100      412,800      0.14        0.48  darkcyan
      4         245      14.7        4,180          16  1,297,000,000        26      1,580   821,000     250,000  2,130,000,000       2,600    819,400      360,300      0.38        0.38  slateblue
      5         151       9.0        3,300          12    505,000,000        10      2,270   223,000      38,000  1,127,000,000       1,030  1,090,200      295,800      0.31        0.69  goldenrod
      6         156       9.3        1,960           7    870,000,000        17        990   883,000     223,000    472,000,000         970    486,200      187,200      0.65        0.50  indianred
      7         165       9.9        2,880          11    719,000,000        14      2,220   324,000      53,000    340,000,000         660    514,900      130,000      0.68        0.77  mediumaquamarine
      8         184      11.0        1,870           7    474,000,000         9      1,630   290,000      55,000     27,000,000         240    114,300       38,900      0.95        0.87  firebrick

6. HIERARCHICAL CLUSTERING ON PRINCIPAL COMPONENTS

7. CLUSTERS DISPLAYED ON SCATTER PLOTS

8. VARIABLE RELEVANCE PER CLUSTER

This section is omitted due to the few variables used in clustering. Both variables are relevant to all clusters.

9. CLUSTERS OVER TIME

Although this clustering did not include any time variables, the submission patterns by cluster are shown.

10. CLUSTER POPULATIONS BY COLLEGE AND MAJOR INSTITUTIONS

259 PIs (15.5%) appear in multiple colleges or institutions.

Percent of organization population per cluster

            Rate cluster
Org            1       2       3       4       5       6       7       8
Arch         0.0    29.4    17.6    29.4    11.8     0.0     5.9     5.9
Bus          6.7     6.7     6.7     6.7     0.0    13.3     0.0    60.0
CTSI        12.5     8.3    20.8    12.5     4.2    20.8    16.7     4.2
CVRTI        5.6    33.3    11.1    22.2    11.1     5.6     5.6     5.6
Dent        11.1    44.4    22.2     0.0     0.0    11.1    11.1     0.0
Educ        17.2    20.7    10.3    17.2     3.4     3.4    13.8    13.8
EGI         10.5    15.8    21.1    15.8    10.5     0.0     0.0    26.3
Engr        23.0    39.1     9.8    11.9     5.5     5.5     3.4     1.7
FinArt       0.0    22.2    22.2     0.0     0.0    22.2     0.0    33.3
Health      17.9    33.9     1.8    19.6     3.6     8.9     5.4     8.9
Hum          6.2    25.0     6.2    12.5     0.0     6.2    18.8    25.0
Hunt         7.7    28.2    13.5    16.0    11.5     5.8    10.9     6.4
ICSE         0.0    30.0    30.0    20.0    10.0     0.0     0.0    10.0
Law         20.0     0.0    40.0    20.0     0.0     0.0    20.0     0.0
Med         10.0    17.3    12.4    13.7    11.5     9.0    13.7    12.3
Nurs        17.1    22.9    22.9    17.1     8.6     2.9     2.9     5.7
other        6.4    17.9    14.1    12.8    14.1    17.9     6.4    10.3
Pharm       12.9    35.5     6.5    11.3    11.3     6.5     6.5     9.7
SCI          2.7    16.2    13.5    24.3    10.8    13.5     5.4    13.5
Science     12.0    23.0     9.7    15.2     6.5    14.3     9.7     9.7
SocBeh      16.5    24.1     8.9    21.5     5.1     8.9     3.8    11.4
SocWrk       8.7    17.4     4.3    17.4     4.3    13.0    17.4    17.4
Tran         0.0    50.0     0.0     0.0     0.0     0.0    50.0     0.0

Percent of cluster population per organization

            Rate cluster
Org            1       2       3       4       5       6       7       8
Arch         0.0     1.1     1.3     1.8     1.1     0.0     0.5     0.5
Bus          0.4     0.2     0.4     0.4     0.0     1.1     0.0     4.5
CTSI         1.3     0.4     2.2     1.1     0.6     2.9     2.1     0.5
CVRTI        0.4     1.3     0.9     1.4     1.1     0.6     0.5     0.5
Dent         0.4     0.9     0.9     0.0     0.0     0.6     0.5     0.0
Educ         2.2     1.3     1.3     1.8     0.6     0.6     2.1     2.0
EGI          0.9     0.7     1.8     1.1     1.1     0.0     0.0     2.5
Engr        23.5    20.4    10.1     9.9     7.4     7.4     4.2     2.0
FinArt       0.0     0.4     0.9     0.0     0.0     1.1     0.0     1.5
Health       4.3     4.2     0.4     3.9     1.1     2.9     1.6     2.5
Hum          0.4     0.9     0.4     0.7     0.0     0.6     1.6     2.0
Hunt         5.2     9.8     9.3     8.8    10.2     5.1     8.9     5.0
ICSE         0.0     0.7     1.3     0.7     0.6     0.0     0.0     0.5
Law          0.4     0.0     0.9     0.4     0.0     0.0     0.5     0.0
Med         33.9    30.0    42.7    37.8    51.1    40.0    56.0    48.2
Nurs         2.6     1.8     3.5     2.1     1.7     0.6     0.5     1.0
other        2.2     3.1     4.8     3.5     6.2     8.0     2.6     4.0
Pharm        3.5     4.9     1.8     2.5     4.0     2.3     2.1     3.0
SCI          0.4     1.3     2.2     3.2     2.3     2.9     1.0     2.5
Science     11.3    11.1     9.3    11.7     8.0    17.7    11.0    10.6
SocBeh       5.7     4.2     3.1     6.0     2.3     4.0     1.6     4.5
SocWrk       0.9     0.9     0.4     1.4     0.6     1.7     2.1     2.0
Tran         0.0     0.2     0.0     0.0     0.0     0.0     0.5     0.0